AITopics | high fidelity speech synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Neural Information Processing Systems, Dec-24-2025, 14:12:16 GMT

Several recent works on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive and flow-based generative models. In this work, we propose HiFi-GAN, which achieves both efficient and high-fidelity speech synthesis. As speech audio consists of sinusoidal signals with various periods, we demonstrate that modeling the periodic patterns of audio is crucial for enhancing sample quality. A subjective human evaluation (mean opinion score, MOS) on a single-speaker dataset indicates that our proposed method demonstrates similarity to human quality while generating 22.05 kHz high-fidelity audio 167.9 times faster than real-time on a single V100 GPU.
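To put the reported speed in concrete terms, the short sketch below converts "167.9 times faster than real-time at 22.05 kHz" into a sample throughput and a wall-clock time per clip. The two constants come from the abstract; the function names are hypothetical and the calculation is plain arithmetic, not a benchmark.

```python
# Figures quoted in the abstract above.
SAMPLE_RATE_HZ = 22_050          # 22.05 kHz output audio
SPEEDUP_OVER_REAL_TIME = 167.9   # reported for a single V100 GPU


def samples_per_second(sample_rate_hz: float, speedup: float) -> float:
    """Audio samples generated per wall-clock second (hypothetical helper)."""
    return sample_rate_hz * speedup


def seconds_to_generate(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to synthesize a clip of the given length."""
    return audio_seconds / speedup


throughput = samples_per_second(SAMPLE_RATE_HZ, SPEEDUP_OVER_REAL_TIME)
one_minute_cost = seconds_to_generate(60.0, SPEEDUP_OVER_REAL_TIME)

print(f"throughput: {throughput / 1e6:.2f} million samples per second")
print(f"one minute of audio: {one_minute_cost:.3f} s of compute")
```

Under these figures, the throughput works out to roughly 3.7 million samples per second, and a one-minute clip takes well under half a second to synthesize.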

generative adversarial network, hifi-gan, high fidelity speech synthesis, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Review for NeurIPS paper: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Neural Information Processing Systems, May-31-2025, 19:13:47 GMT

Strengths: (1) The paper proposes a new model, HiFi-GAN, for efficient and high-fidelity raw waveform generation from mel-spectrograms. In addition to the existing Multi-Scale Discriminator (MSD), the discriminator also includes a set of small sub-discriminators, collectively called the Multi-Period Discriminator (MPD). Each sub-discriminator handles a portion of the periodic signals in the input audio to capture the diverse periodic patterns underlying the audio data.
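One way to picture how a sub-discriminator can look at only "a portion of periodic signals" is to fold the 1D waveform into rows of a fixed length, so that each column holds samples spaced exactly one period apart. The sketch below illustrates that folding in plain Python. It is an assumption-laden illustration rather than the reviewed implementation: the names `fold_by_period` and `column` are hypothetical, the period value is arbitrary, and zero-padding is used only for simplicity.

```python
def fold_by_period(samples, period):
    """Fold a 1D sequence into rows of length `period`.

    After folding, column p holds samples p, p + period, p + 2 * period, ...
    The tail is zero-padded so the length is a multiple of `period`
    (a simplification chosen for this sketch).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    samples = list(samples)
    remainder = len(samples) % period
    if remainder:
        samples.extend([0.0] * (period - remainder))
    return [samples[i:i + period] for i in range(0, len(samples), period)]


def column(folded, p):
    """Return the equally spaced samples that share phase p."""
    return [row[p] for row in folded]


# Toy waveform: sample values equal their own indices, for readability.
waveform = list(range(10))
folded = fold_by_period(waveform, period=3)

for row in folded:
    print(row)
print("phase 0:", column(folded, 0))
```

A discriminator applied along the columns of such a folded array only ever compares samples that are one period apart, which is one plausible reading of how different sub-discriminators could specialize in different periodic patterns.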

generative adversarial network, hifi-gan, high fidelity speech synthesis, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Review for NeurIPS paper: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Neural Information Processing Systems, May-31-2025, 19:13:40 GMT

This work initially received mixed reviews, but after the author feedback cleared up a misunderstanding, most reviewers are now recommending acceptance. Nevertheless, I think R2 (who has not raised their score) has some valid concerns, which I want to account for in my decision. I have decided to recommend acceptance. The experimental section of this work is fairly comprehensive and adequately demonstrates that the proposed architecture is effective. However, it is important to point out that the majority of experiments were conducted using ground-truth mel-spectrogram conditioning, which does not match the usual practical setting of TTS systems, where the spectrograms are themselves generated by a model (and are thus imperfect).

author feedback, generative adversarial network, high fidelity speech synthesis, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Neural Information Processing Systems, Oct-11-2024, 07:31:08 GMT

generative adversarial network, hifi-gan, high fidelity speech synthesis, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)
